Online learning in MDPs with side information
Abstract
We study online learning of finite Markov decision process (MDP) problems when a side information vector is available. The problem is motivated by applications such as clinical trials and recommendation systems. Such applications have an episodic structure, where each episode corresponds to a patient or customer. Our objective is to compete with the optimal dynamic policy that can take side information into account. We propose a computationally efficient algorithm and show that its regret is at most O(√T), where T is the number of rounds. To the best of our knowledge, this is the first regret bound for this setting.
Similar resources
Online Linear Regression and Its Application to Model-Based Reinforcement Learning
We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a model-based approach and show that a special type of online linear regression allows us to learn MDPs with (possibly kernelized) linearly parameterized dynamics. This result builds on Kearns and Singh’s work that provides ...
User’s Interaction with Information through eFront Learning Management System
Background and Aim: To understand interactive content and content production standards, as well as users' interaction with LMSs and their behavior in dealing with information, this paper examines users' interaction with the information provided in eFront, an open source Learning Management System, with emphasis on the SCORM standard. Method: The method that used...
Facilitating Internalization in E-Learning Through New Information System
This paper studies Vygotsky’s (1987) sociocultural theory of learning as it relates to technology-based second language learning and teaching. The participants were selected from advanced students at Payame Noor University and divided into two groups: an experimental group and a control group. After teaching the course an experimental group...
Markov Decision Processes with Continuous Side Information
We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP for that episode. Our setting is motivated by applications in healthcare where baseline measurements of a patient at the start of a treatment episode form the...
Incremental Structure Learning in Factored MDPs with Continuous States and Actions
Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transitio...
Journal: CoRR
Volume: abs/1406.6812
Publication date: 2014